Introduction

The 2020 Exploratory Testing Survey contains responses from 3,080 adult citizens across the United States to questions about the 2020 election. The survey data were collected in April 2020, about half a year before Election Day. Because the 2020 election admitted more mail-in ballots than usual, a series of concerns about election integrity and fraud arose and fed a massive post-election ‘fraudulent mess’.

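In the questionnaire, respondents are asked about voting in person vs. voting by mail. After analyzing the survey data, I’d like to answer the following three questions:

+ Is the mail-in ballot a potential cause of the ‘fraudulent mess’?
+ Would Joe Biden win or lose by a large margin based on the survey data, and did the survey reflect the actual 2020 election result?
+ Do Democrats favor voting by mail more than Republicans?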

Some basic summaries of the raw dataset:

+ There are 3080 respondents
+ There are 470 variables

Each column corresponds to one question in the survey. The PDF version of the questionnaire is available at: https://electionstudies.org/data-center/2020-exploratory-testing-survey/

Although the questionnaire covers many questions, this project focuses on only four of them. I processed the raw dataset, extracted the corresponding four columns, and saved the new dataset in a data file called “anes_data.csv”.

Question 1: Is the mail-in ballot a potential cause of the ‘fraudulent mess’?

In the ANES 2020 pilot questionnaire, under the ELECTORAL INTEGRITY section, respondents are asked whether they favor or oppose mail-in ballots and how accurate they think the vote count will be.

Let’s take a look at the answers to votemail1a and votemail1b:

##  Q1a 
## <NA> 
## Levels: c(4, 5, 6, 1, 7, 2, 3, 88)
##  Q1b 
## <NA> 
## Levels: c(77, 6, 1, 3, 4, 5, 2, 7)

The survey was administered in two randomly assigned forms. For this question, respondents who received Form 2 are coded ‘88’ in the ‘votemail1a’ column: rows #1500 to #3080 hold ‘88’. Similarly, in the ‘votemail1b’ column, rows #1 to #1499 are coded ‘77’. I combine the two questions into one column named ‘votemail’ and plot the distribution of answers.
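The combining step described above can be sketched as follows. This is a minimal illustration on a made-up six-row data frame (`toy` is hypothetical, not the survey data), and it assumes both columns are stored as numeric codes and that ‘77’ and ‘88’ are the only "not asked" codes.

```r
# Toy stand-in for anes_data (hypothetical values, not survey results).
toy <- data.frame(
  votemail1a = c(1, 4, 7, 88, 88, 88),
  votemail1b = c(77, 77, 77, 2, 5, 3)
)

# Codes that mean "question not asked on this form" (assumption: only these two).
not_asked <- c(77, 88)

# Use the votemail1a answer when it is a real answer, otherwise votemail1b.
toy$votemail <- ifelse(toy$votemail1a %in% not_asked,
                       toy$votemail1b,
                       toy$votemail1a)

# Anything still coded 77/88 was answered on neither form; treat it as missing.
toy$votemail[toy$votemail %in% not_asked] <- NA

# Distribution of the combined answers on the 1-7 scale.
counts <- table(factor(toy$votemail, levels = 1:7))
print(counts)
```

Passing `counts` to `barplot()` would give a bar chart of the distribution; applied to the real data, the same steps would run on `anes_data` instead of `toy`.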

The plot above shows that more people favor mail-in ballots than oppose them.
Do the people who favor mail-in ballots really trust the accuracy of the election result?

Among those who favor mail-in ballots, how much do they trust the accuracy of the counting process?

Among those who favor mail-in ballots, about 1/6 still do not trust that the vote count will be accurate, and that is before counting the people who oppose mail-in ballots. Thus we can conclude that the mail-in ballot is at least a potential cause of the ‘fraudulent mess’.

Question 2: Would Joe Biden win or lose by a large margin based on the survey data? Did the survey reflect the actual 2020 election result? If not, what are the potential reasons?

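# number of respondents who will vote for Donald Trump (vote20jb == 1)
DT <- sum(anes_data$vote20jb == 1, na.rm = TRUE)
# number of respondents who will vote for Joe Biden (vote20jb == 2)
JB <- sum(anes_data$vote20jb == 2, na.rm = TRUE)
# Biden's lead over Trump as a proportion of Trump's count
# (multiply by 100 to report it as a percentage)
percentJB <- (JB - DT) / DT

# same relative lead computed from the final 2020 vote totals
percent_real <- (81281888 - 74223251) / 74223251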

From the survey data, Joe Biden would win 5.41% more votes than Donald Trump. In the final 2020 election result, Joe Biden won 81,281,888 votes and Donald Trump won 74,223,251 votes, so Biden won 9.51% more votes than Trump. The survey data underestimated Biden's lead and did not reflect the final election result.
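To make the arithmetic behind these percentages explicit, here is a small sketch. The helper `relative_lead` is a hypothetical name introduced only for illustration; it expresses one candidate's lead as a percentage of the other candidate's votes, which is the measure used in this section. The survey figure is copied from the text rather than recomputed, since the raw counts are not shown here.

```r
# Lead of candidate A over candidate B, as a percentage of B's votes.
# (Hypothetical helper, not part of the original analysis.)
relative_lead <- function(votes_a, votes_b) {
  100 * (votes_a - votes_b) / votes_b
}

# Final 2020 vote totals quoted in the text: Biden, then Trump.
real_lead <- relative_lead(81281888, 74223251)
print(round(real_lead, 2))  # about 9.51, matching the figure above

# Survey-based lead as reported in the text.
survey_lead <- 5.4140127

# Gap between the actual and the survey-based lead, in percentage points.
gap <- real_lead - survey_lead
print(gap)
```

Note that this measure is relative to Trump's vote count, so it is not the same as a margin expressed in points of the total vote.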

Some potential causes:

Question 3: Do Democrats favor voting by mail more than Republicans?

Following the previous two questions, do Democrats (or people who vote for Biden) favor voting by mail more than Republicans (or people who vote for Trump)?

I would like to categorize the data into four groups:

# Start with every respondent unclassified.
anes_data$Demo_Rep <- NA_character_

# votemail1a: 1-3 = favor, 5-7 = oppose; vote20jb: 1 = Trump, 2 = Biden.
# %in% treats NA as FALSE, so rows with missing answers stay unclassified.
# Respondents coded 88 in votemail1a (Form 2) are also left as NA.
favor  <- anes_data$votemail1a %in% 1:3
oppose <- anes_data$votemail1a %in% 5:7
biden  <- anes_data$vote20jb %in% 2
trump  <- anes_data$vote20jb %in% 1

anes_data$Demo_Rep[favor  & biden] <- "Democrats favor mail-in"
anes_data$Demo_Rep[oppose & biden] <- "Democrats oppose mail-in"
anes_data$Demo_Rep[favor  & trump] <- "Republicans favor mail-in"
anes_data$Demo_Rep[oppose & trump] <- "Republicans oppose mail-in"

Democrats favor mail-in ballots more than Republicans: roughly half of Republicans favor mail-in ballots, while the other half oppose them.
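One way to check a comparison like this is to turn the four-group labels into within-party shares. The sketch below uses a tiny made-up data frame (`toy` is hypothetical and its values are not survey results); it assumes the labels follow the "Party favor/oppose mail-in" pattern used for the `Demo_Rep` column.

```r
# Toy stand-in for the Demo_Rep column (hypothetical values, not survey results).
toy <- data.frame(
  Demo_Rep = c("Democrats favor mail-in", "Democrats favor mail-in",
               "Democrats oppose mail-in", "Republicans favor mail-in",
               "Republicans oppose mail-in", NA)
)

# Drop unclassified respondents, then split each label into party and stance.
labelled <- toy[!is.na(toy$Demo_Rep), , drop = FALSE]
labelled$party  <- sub(" .*", "", labelled$Demo_Rep)
labelled$stance <- ifelse(grepl("favor", labelled$Demo_Rep), "favor", "oppose")

# Counts by party and stance, then shares within each party (rows sum to 1).
counts <- table(labelled$party, labelled$stance)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Comparing the "favor" column across the two rows gives the share of each party's voters who favor mail-in ballots, which is the quantity the statement above is about.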

Conclusion

Due to the pandemic, many people favor mail-in ballots. But this also increases the risk of an inaccurate counting process. People who vote for the Democratic candidate tend to favor mail-in ballots more than people who support the Republican candidate. This could also help explain what happened after the election in Washington, D.C.